Join Size Estimation Over Data Streams Using Cosine Series
Abstract
In many applications, data takes the form of a continuous stream rather than a persistent data set. Data stream processing is generally an on-line, one-pass process and must be efficient in both time and space. In this paper, we develop a framework for estimating join size over data streams based on the discrete cosine transform (DCT). The DCT can generally provide concise and accurate approximations to data distributions, and its coefficients can be updated easily in the presence of insertions and deletions. These features make the DCT suitable for dynamic data stream environments. We have performed analyses and conducted experiments to investigate the applicability of the cosine transform to data streams. The experimental results show that, given the same amount of storage space, our method yields more accurate estimates most of the time than the sketch-based methods, which have become the main methods for approximate query processing over data streams. The experimental results also confirm that the cosine series can be updated quickly enough to cope with the rapid flow of data streams.
Keywords: Data Stream, Cosine Series, Query Estimation
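The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's exact formulation: it assumes join attribute values are integers in 0..n-1, uses the standard orthonormal DCT-II cosine basis, and the names `CosineSynopsis` and `estimate_join_size` are hypothetical. Each stream keeps only the first m running coefficient sums; an insertion adds a basis value to each sum, a deletion subtracts it, and the equi-join size is estimated from the inner product of the two coefficient vectors.

```python
import math


class CosineSynopsis:
    """Running cosine-series sums for one stream (illustrative sketch only)."""

    def __init__(self, domain_size, num_coeffs):
        self.n = domain_size            # assumed attribute domain: 0 .. n-1
        self.m = num_coeffs             # number of coefficients kept (m <= n)
        self.sums = [0.0] * num_coeffs  # S_k = sum over tuples of phi_k(value)

    def _basis(self, k, value):
        # Orthonormal cosine basis sampled at the midpoint of each value's cell.
        if k == 0:
            return 1.0
        x = (value + 0.5) / self.n
        return math.sqrt(2.0) * math.cos(k * math.pi * x)

    def update(self, value, count=1):
        # count=+1 for an insertion, count=-1 for a deletion; O(m) per tuple.
        for k in range(self.m):
            self.sums[k] += count * self._basis(k, value)


def estimate_join_size(a, b):
    """Estimate |R join S| on the shared attribute from two synopses."""
    if a.n != b.n:
        raise ValueError("synopses must share the same domain size")
    m = min(a.m, b.m)
    return sum(a.sums[k] * b.sums[k] for k in range(m)) / a.n


if __name__ == "__main__":
    r = CosineSynopsis(domain_size=8, num_coeffs=8)
    s = CosineSynopsis(domain_size=8, num_coeffs=8)
    for v in [0, 1, 1, 3, 7]:
        r.update(v)
    for v in [1, 1, 3, 3, 5]:
        s.update(v)
    print(estimate_join_size(r, s))
```

When all n coefficients are kept, the orthogonality of the DCT-II basis makes the estimate equal the exact join size up to floating-point rounding. Keeping m < n coefficients trades accuracy for space, which is the regime the abstract compares against sketch-based methods.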
Similar resources
Selectivity Estimation over Multiple Data Streams using Micro-clustering
Selectivity estimation is an important task for query optimization. We propose a technique to perform range query estimation over multiple data streams using micro-clustering. The technique maintains cluster statistics in terms of micro-clusters and cosine series for all streams. These microclusters maintain data distribution information about the stream values using cosine coefficients. These ...
Processing Data-Stream Join Aggregates Using Skimmed Sketches
There is a growing interest in on-line algorithms for analyzing and querying data streams that examine each stream element only once and have at their disposal only a limited amount of memory. Providing (perhaps approximate) answers to aggregate queries over such streams is a crucial requirement for many application environments; examples include large IP network installations where performan...
Selectivity estimation of range queries in data streams using micro-clustering
Selectivity estimation is an important task for query optimization. The common data mining techniques are not applicable on large, fast and continuous data streams as they require one pass processing of data. These requirements make Range Query Estimation (RQE) a challenging task. We propose a technique to perform RQE using micro-clustering. The technique maintains cluster statistics in terms o...
Stream Window Join: Tracking Moving Objects in Sensor-Network Databases
The widespread use of sensor networks presents revolutionary opportunities for life and environmental science applications. Many of these applications involve continuous queries that require the tracking, monitoring, and correlation of multi-sensor data that represent moving objects. We propose to answer these queries using a multi-way stream window join operator. This form of join over multise...
Analytical and Experimental Evaluation of Stream-based Join
Continuous queries over data streams have gained popularity as the breadth of possible applications, ranging from network monitoring to online pattern discovery, has increased. Joining of streams is a fundamental issue that must be resolved to enable complex queries over multiple streams. However, as streams can represent potentially infinite data, it is infeasible to have full join evaluation...
Publication date: 2007